Exercises and objectives

The objectives of the exercises of this assignment are:
1) Download and organise the data; model and plot staircase responses based on fits of logistic functions
2) Fit multilevel models for response times
3) Fit multilevel models for count data

Exercise 1

Data used in the assignment: https://osf.io/ecxsj/files/ (experiment 2)

Data is associated with Experiment 2 of the article at the following DOI: https://doi.org/10.1016/j.concog.2019.03.007

1) Put the data from all subjects into a single data frame

df <- read_bulk(directory = "experiment_2", # read_bulk() comes from the readbulk package
                fun = read_csv)

2) Describe the data and construct extra variables from the existing variables

i. add a variable to the data frame and call it correct (have it be a logical variable). Assign a 1 to each row where the subject indicated the correct answer and a 0 to each row where the subject indicated the incorrect answer (Hint: the variable obj.resp indicates whether the subject answered “even”, e, or “odd”, o, and the variable target.type indicates what was actually presented).

df$correct <- ifelse((df$target.type == "odd" & df$obj.resp == "o") | (df$target.type == "even" & df$obj.resp == "e"), 1, 0)
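As a quick base-R check of the coding logic, the ifelse() condition can be tried on a toy data frame covering all four response/target combinations (the values are made up for illustration):

```r
# toy rows covering the four possible response/target combinations
toy <- data.frame(
  target.type = c("odd", "odd", "even", "even"),
  obj.resp    = c("o",   "e",   "e",    "o")
)
toy$correct <- ifelse((toy$target.type == "odd"  & toy$obj.resp == "o") |
                      (toy$target.type == "even" & toy$obj.resp == "e"), 1, 0)
toy$correct  # 1 0 1 0: matching response/target pairs are coded as correct
```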

ii. describe what the following variables in the data frame contain. For each of them, indicate and argue for what class they should be classified into, e.g. factor, numeric etc.

str(df)
## 'data.frame':    18131 obs. of  19 variables:
##  $ trial.type     : chr  "staircase" "staircase" "staircase" "staircase" ...
##  $ pas            : num  4 4 4 4 4 4 4 4 4 4 ...
##  $ trial          : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ jitter.x       : num  -0.3425 0.0623 -0.4058 -0.3615 0.2891 ...
##  $ jitter.y       : num  0.4489 0.0291 0.4995 -0.2224 0.4132 ...
##  $ odd.digit      : num  9 9 7 7 7 7 7 7 9 9 ...
##  $ target.contrast: num  1 1 0.9 0.9 0.8 0.8 0.5 0.5 0.3 0.3 ...
##  $ target.frames  : num  3 3 3 3 3 3 3 3 3 3 ...
##  $ cue            : num  29 29 29 29 29 29 29 29 29 29 ...
##  $ task           : chr  "pairs" "pairs" "pairs" "pairs" ...
##  $ target.type    : chr  "odd" "even" "even" "odd" ...
##  $ rt.subj        : num  1.754 1.39 0.686 0.653 0.582 ...
##  $ rt.obj         : num  1.158 2.456 0.951 0.62 0.95 ...
##  $ even.digit     : num  8 4 4 4 4 4 4 4 4 8 ...
##  $ seed           : num  9817 9817 9817 9817 9817 ...
##  $ obj.resp       : chr  "o" "e" "e" "o" ...
##  $ subject        : chr  "001" "001" "001" "001" ...
##  $ File           : chr  "001.csv" "001.csv" "001.csv" "001.csv" ...
##  $ correct        : num  1 1 1 1 1 1 1 1 1 1 ...
df$pas <- as.factor(df$pas)
df$cue <- as.factor(df$cue)
df$correct <- as.factor(df$correct)
df$even.digit <- as.factor(df$even.digit)
df$subject <- as.numeric(df$subject) # for analysis later
df$subject <- as.factor(df$subject)

df <- df %>%
  select(trial.type, pas, trial, target.contrast, cue, task, target.type, rt.subj, rt.obj, obj.resp, subject, correct)

trial.type : whether the trial belonged to the staircasing or the experiment phase
pas : Perceptual Awareness Scale rating, indicating how clearly the subject felt they saw the stimulus (4 categories: from 1, not seen, to 4, seen clearly)
trial : trial number for each participant
target.contrast : the contrast of the low-contrast target digit presented between cue and target
cue : the cue presented at the beginning of the trial (changing after 12 trials)
task : indicates the number of digits in the cue (e.g. singles: 2:4, pairs: 24:57, quadruplet: 2468:3579)
target.type : whether the target shown was an even or an odd number
rt.subj : reaction time for the subjective (PAS) response (seconds)
rt.obj : reaction time for the objective (even/odd) response (seconds)
obj.resp : whether the subject indicated that the target was even (e) or odd (o)
subject : unique identifier for each participant
correct : indicates whether the subject gave the correct answer (1 = correct, 0 = incorrect)

iii. for the staircasing part only, create a plot for each subject where you plot the estimated function (on the target.contrast range from 0-1) based on the fitted values of a model (use glm) that models correct as dependent on target.contrast. These plots will be our no-pooling model. Comment on the fits - do we have enough data to plot the logistic functions?

df_f <- df %>%
  filter(trial.type == "staircase") 

inv.logit <- function(x) exp(x) / (1 + exp(x)) # inverse logit, equivalent to base R's plogis()
for (i in 1:29) {
  
  df_s <- df_f %>%
    filter(subject == i)
  
  model <- glm(correct ~ target.contrast, data = df_s, family = binomial)
  
  # fitted values of a binomial glm are already on the probability (response)
  # scale, so no further inverse-logit transformation is needed
  df_s <- df_s %>%
    mutate(fit = fitted(model))
  
  plot <- ggplot(df_s, aes(x = target.contrast, y = fit)) +
    geom_point() +
    labs(title = paste("Subject", i, "- estimated logistic function"))
  print(plot)
}
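As a quick sanity check of the helper (restated here so the snippet is self-contained), the inverse logit should map a log-odds of zero to chance level and agree with base R's built-in plogis():

```r
inv.logit <- function(x) exp(x) / (1 + exp(x)) # same definition as above

inv.logit(0)                       # 0.5: a log-odds of zero is chance level
all.equal(inv.logit(c(-2, 0, 3)),  # TRUE: identical to base R's plogis()
          plogis(c(-2, 0, 3)))
```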

Looking at the plots, the estimated functions do not fit well. The staircasing phase does not appear to contain enough data per subject to trace out full logistic functions.

iv. on top of those plots, add the estimated functions (on the target.contrast range from 0-1) for each subject based on partial pooling model (use glmer from the package lme4) where unique intercepts and slopes for target.contrast are modelled for each subject

model_2 <- lme4::glmer(correct ~ target.contrast + (1 | subject), data = df_f, family = binomial)

summary(model_2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ target.contrast + (1 | subject)
##    Data: df_f
## 
##      AIC      BIC   logLik deviance df.resid 
##   5999.0   6018.9  -2996.5   5993.0     5600 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.6646  0.1464  0.5603  0.5902  0.7214 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.03959  0.199   
## Number of obs: 5603, groups:  subject, 29
## 
## Fixed effects:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.88538    0.06121  14.464  < 2e-16 ***
## target.contrast  3.20846    0.40214   7.978 1.48e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## trgt.cntrst -0.538
# (abandoned attempt: looping over subjects as above, fitting per-subject glms
#  and trying to add an extra geom layer with the partial-pooling estimates
#  from model_2 - see the note below)
df_f <- df_f %>%
  mutate(fit = fitted(model_2)) # fitted() on a binomial glmer already returns probabilities
  
ggplot(df_f, aes(x = target.contrast, y = fit, color = subject)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Fitted values for each subject (partial pooling)") +
  facet_wrap(~subject)

ggplot(df_f, aes(x = target.contrast, y = fit, color = subject)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Overall fitted values (all subjects in same plot)")

NB! After a lot of research, I was still not able to find a ggplot2 solution for adding a layer with estimates from a separate model on top of the already existing plots. I understand the goal of the exercise and have an intuition about it, but I did not fully succeed in the overlaying. I am not confident enough in other plotting tools and packages to combine the two models in one figure, and have chosen to prioritise learning how to implement this properly when I have the time.
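One possible approach, sketched here on simulated data rather than the real data frame (the subjects, coefficients and column names are made up; lme4 and ggplot2 are assumed to be installed): predict both the no-pooling and partial-pooling models on a common target.contrast grid and overlay them with two geom_line() layers.

```r
library(lme4)
library(ggplot2)

set.seed(1)
# toy data standing in for df_f: 5 subjects, accuracy improving with contrast
toy <- data.frame(subject = factor(rep(1:5, each = 60)),
                  target.contrast = runif(300))
toy$correct <- rbinom(300, 1, plogis(0.5 + 3 * toy$target.contrast))

# partial pooling: unique intercepts and slopes for target.contrast per subject
pp <- glmer(correct ~ target.contrast + (1 + target.contrast | subject),
            data = toy, family = binomial)

# predict the partial-pooling model on a common 0-1 contrast grid
grid <- expand.grid(target.contrast = seq(0, 1, by = 0.01),
                    subject = levels(toy$subject))
grid$pp_fit <- predict(pp, newdata = grid, type = "response")

# no pooling: one glm per subject, predicted on the same grid
np <- lapply(split(toy, toy$subject), function(d)
  predict(glm(correct ~ target.contrast, data = d, family = binomial),
          newdata = data.frame(target.contrast = seq(0, 1, by = 0.01)),
          type = "response"))
grid$np_fit <- unlist(np)  # subjects in level order, matching expand.grid

p <- ggplot(grid, aes(x = target.contrast)) +
  geom_line(aes(y = np_fit), linetype = "dashed") +  # no pooling
  geom_line(aes(y = pp_fit), colour = "red") +       # partial pooling
  facet_wrap(~ subject) +
  labs(y = "P(correct)",
       title = "No pooling (dashed) vs partial pooling (red)")
print(p)
```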

v. in your own words, describe how the partial pooling model allows for a better fit for each subject

With a no-pooling model, one is not able to draw any generalisations: each fit is based on a single subject. With partial pooling, one can capture the bigger picture while still taking individual differences into account. Each subject's fit is informed not only by that subject's data but also by the "grand mean" across subjects, which shrinks noisy individual estimates toward the group estimate and allows us, to a greater extent, to generalise from the model.
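The weighting of subject mean against grand mean can be illustrated with a toy base-R calculation (assuming the standard normal-normal shrinkage formula; the numbers are made up and not derived from this dataset):

```r
# precision-weighted shrinkage: with between-subject variance tau2,
# residual variance sigma2 and n observations for a subject, the
# partial-pooling estimate is a weighted average of the subject's
# own mean and the grand mean
shrink <- function(subj_mean, grand_mean, tau2, sigma2, n) {
  w <- tau2 / (tau2 + sigma2 / n)  # weight on the subject's own mean
  w * subj_mean + (1 - w) * grand_mean
}

# a subject with few trials is pulled strongly toward the grand mean...
shrink(subj_mean = 2.0, grand_mean = 1.0, tau2 = 0.1, sigma2 = 8, n = 5)
# ...while a subject with many trials keeps an estimate close to their own mean
shrink(subj_mean = 2.0, grand_mean = 1.0, tau2 = 0.1, sigma2 = 8, n = 500)
```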


Exercise 2

Now we only look at the experiment trials (trial.type)

1) Pick four subjects and plot their Quantile-Quantile (Q-Q) plots for the residuals of their objective response times (rt.obj) based on a model where only intercept is modelled

df_e <- df %>%
  filter(trial.type == "experiment") 

df_s1 <- df_e %>%
  filter(subject == 1)

model_s1 <- glm(rt.obj~1, data=df_s1)

df_s6 <- df_e %>%
  filter(subject == 6)

model_s6 <- glm(rt.obj~1, data=df_s6)


df_s11 <- df_e %>%
  filter(subject == 11)

model_s11 <- glm(rt.obj~1, data=df_s11)


df_s15 <- df_e %>%
  filter(subject == 15)

model_s15 <- glm(rt.obj~1, data=df_s15)
qqnorm(residuals(model_s1))
qqline(residuals(model_s1))

qqnorm(residuals(model_s6))
qqline(residuals(model_s6))

qqnorm(residuals(model_s11))
qqline(residuals(model_s11))

qqnorm(residuals(model_s15))
qqline(residuals(model_s15))

i. comment on these

The visual inspection shows that none of the subjects’ objective response time residuals look normally distributed. They all indicate positively skewed residuals.

ii. does a log-transformation of the response time data improve the Q-Q-plots?

model_s1_log <- glm(log(rt.obj)~1, data=df_s1)

model_s6_log <- glm(log(rt.obj)~1, data=df_s6)

model_s11_log <- glm(log(rt.obj)~1, data=df_s11)

model_s15_log <- glm(log(rt.obj)~1, data=df_s15)
qqnorm(residuals(model_s1_log))
qqline(residuals(model_s1_log))

qqnorm(residuals(model_s6_log))
qqline(residuals(model_s6_log))

qqnorm(residuals(model_s11_log))
qqline(residuals(model_s11_log))

qqnorm(residuals(model_s15_log))
qqline(residuals(model_s15_log))

Yes. The log-transformation improves the Q-Q plots considerably. Some positive skew remains, but it is much smaller than before.
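The effect of the log transform can be illustrated on simulated response times (a sketch; lognormal RTs are an assumption for illustration, not a property verified for this dataset):

```r
set.seed(7)
rt <- rlnorm(5000, meanlog = 0, sdlog = 0.8)  # right-skewed, RT-like data

# simple moment-based skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skewness(rt)       # strongly positive
skewness(log(rt))  # close to zero: log(rt) is exactly normal here
```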

2) Now do a partial pooling model modelling objective response times as dependent on task (set REML=FALSE in your lmer-specification)

i. which would you include among your random effects and why? (support your choices with relevant measures, taking into account variance explained and number of parameters going into the modelling)

In the model, subject is included as a random intercept. Because the experiment is a repeated-measures design, subject has to be accounted for; otherwise the assumption of independence between observations would be violated.

model_2.2 <- lmer(rt.obj ~ task + (1 | subject), data = df_e, REML = FALSE)
summary(model_2.2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: rt.obj ~ task + (1 | subject)
##    Data: df_e
## 
##      AIC      BIC   logLik deviance df.resid 
##  61940.2  61977.4 -30965.1  61930.2    12523 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
##  -0.630  -0.155  -0.072   0.051 101.443 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 0.1147   0.3386  
##  Residual             8.1739   2.8590  
## Number of obs: 12528, groups:  subject, 29
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)     1.12008    0.07689  14.568
## taskquadruplet -0.15325    0.06257  -2.449
## tasksingles    -0.19154    0.06257  -3.061
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr
## taskqudrplt -0.407       
## tasksingles -0.407  0.500
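To support the choice with a variance measure, the intraclass correlation implied by the summary above can be computed directly (base R arithmetic; the numbers are copied from the printed variance components):

```r
# variance components from the summary of model_2.2
var_subject  <- 0.1147
var_residual <- 8.1739

icc <- var_subject / (var_subject + var_residual)
icc  # about 0.014: subjects account for roughly 1.4% of the total variance
```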

ii. explain in your own words what your chosen models says about response times between the different tasks

The model shows that response time is highest for pairs (the reference level captured by the intercept), β = 1.120, SE = 0.077, t = 14.57. The quadruplet task decreases response time by 0.153 s (t = -2.45), while the singles task decreases it by 0.192 s (t = -3.06), giving the singles task the lowest response times.


3) Now add pas and its interaction with task to the fixed effects

model_2.2_int <- lmer(rt.obj ~ task*pas + (1 | subject), data = df_e, REML = FALSE)

summary(model_2.2_int)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: rt.obj ~ task * pas + (1 | subject)
##    Data: df_e
## 
##      AIC      BIC   logLik deviance df.resid 
##  61917.4  62021.5 -30944.7  61889.4    12514 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
##  -0.667  -0.152  -0.064   0.047 101.556 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 0.09749  0.3122  
##  Residual             8.14982  2.8548  
## Number of obs: 12528, groups:  subject, 29
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          1.33526    0.09786  13.644
## taskquadruplet      -0.32291    0.10682  -3.023
## tasksingles         -0.21563    0.11555  -1.866
## pas2                -0.13942    0.11496  -1.213
## pas3                -0.33591    0.13139  -2.557
## pas4                -0.57946    0.13410  -4.321
## taskquadruplet:pas2  0.22731    0.15816   1.437
## tasksingles:pas2     0.12509    0.16607   0.753
## taskquadruplet:pas3  0.26240    0.18217   1.440
## tasksingles:pas3     0.09453    0.18518   0.510
## taskquadruplet:pas4  0.25827    0.17970   1.437
## tasksingles:pas4     0.07097    0.17463   0.406
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng pas2   pas3   pas4   tskq:2 tsks:2 tskq:3
## taskqudrplt -0.562                                                        
## tasksingles -0.517  0.474                                                 
## pas2        -0.553  0.478  0.441                                          
## pas3        -0.499  0.420  0.386  0.425                                   
## pas4        -0.504  0.410  0.373  0.414  0.395                            
## tskqdrplt:2  0.378 -0.677 -0.319 -0.697 -0.282 -0.276                     
## tsksngls:p2  0.368 -0.330 -0.697 -0.672 -0.279 -0.270  0.481              
## tskqdrplt:3  0.328 -0.587 -0.279 -0.280 -0.681 -0.238  0.397  0.194       
## tsksngls:p3  0.327 -0.296 -0.625 -0.276 -0.677 -0.242  0.199  0.437  0.484
## tskqdrplt:4  0.333 -0.595 -0.282 -0.284 -0.249 -0.658  0.402  0.195  0.348
## tsksngls:p4  0.343 -0.314 -0.662 -0.290 -0.255 -0.681  0.211  0.460  0.185
##             tsks:3 tskq:4
## taskqudrplt              
## tasksingles              
## pas2                     
## pas3                     
## pas4                     
## tskqdrplt:2              
## tsksngls:p2              
## tskqdrplt:3              
## tsksngls:p3              
## tskqdrplt:4  0.176       
## tsksngls:p4  0.413  0.507

i. how many types of group intercepts (random effects) can you add without ending up with convergence issues or singular fits?

model_2.2_i <- lmer(rt.obj ~ task*pas + (1 | subject) + (1 | cue) + (1 | trial) + (1 | target.type), data = df_e, REML = FALSE)

summary(model_2.2_i)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: rt.obj ~ task * pas + (1 | subject) + (1 | cue) + (1 | trial) +  
##     (1 | target.type)
##    Data: df_e
## 
##      AIC      BIC   logLik deviance df.resid 
##  61911.8  62038.2 -30938.9  61877.8    12511 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
##  -1.110  -0.151  -0.060   0.052 101.301 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  trial       (Intercept) 0.001579 0.03973 
##  cue         (Intercept) 0.090143 0.30024 
##  subject     (Intercept) 0.097950 0.31297 
##  target.type (Intercept) 0.002198 0.04689 
##  Residual                8.111430 2.84806 
## Number of obs: 12528, groups:  trial, 432; cue, 36; subject, 29; target.type, 2
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          1.32179    0.11525  11.469
## taskquadruplet      -0.19687    0.16417  -1.199
## tasksingles         -0.16532    0.12280  -1.346
## pas2                -0.10552    0.11534  -0.915
## pas3                -0.30553    0.13193  -2.316
## pas4                -0.54952    0.13458  -4.083
## taskquadruplet:pas2  0.18823    0.15834   1.189
## tasksingles:pas2     0.08359    0.16647   0.502
## taskquadruplet:pas3  0.23070    0.18245   1.264
## tasksingles:pas3     0.06229    0.18568   0.335
## taskquadruplet:pas4  0.22297    0.17999   1.239
## tasksingles:pas4     0.04867    0.17571   0.277
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng pas2   pas3   pas4   tskq:2 tsks:2 tskq:3
## taskqudrplt -0.306                                                        
## tasksingles -0.426  0.358                                                 
## pas2        -0.472  0.304  0.415                                          
## pas3        -0.427  0.279  0.368  0.429                                   
## pas4        -0.428  0.286  0.358  0.415  0.397                            
## tskqdrplt:2  0.324 -0.433 -0.301 -0.699 -0.286 -0.277                     
## tsksngls:p2  0.314 -0.225 -0.656 -0.674 -0.282 -0.271  0.484              
## tskqdrplt:3  0.282 -0.383 -0.266 -0.284 -0.684 -0.240  0.400  0.197       
## tsksngls:p3  0.281 -0.208 -0.592 -0.280 -0.678 -0.243  0.203  0.440  0.487
## tskqdrplt:4  0.283 -0.399 -0.272 -0.285 -0.252 -0.660  0.403  0.197  0.350
## tsksngls:p4  0.291 -0.220 -0.628 -0.292 -0.257 -0.682  0.213  0.461  0.188
##             tsks:3 tskq:4
## taskqudrplt              
## tasksingles              
## pas2                     
## pas3                     
## pas4                     
## tskqdrplt:2              
## tsksngls:p2              
## tskqdrplt:3              
## tsksngls:p3              
## tskqdrplt:4  0.177       
## tsksngls:p4  0.414  0.510

We managed to add four group intercepts (subject, cue, trial and target.type) without running into convergence issues or singular fits.

ii. create a model by adding random intercepts (without modelling slopes) that results in a singular fit - then use print(VarCorr(<your.model>), comp='Variance') to inspect the variance vector - explain why the fit is singular (Hint: read the first paragraph under details in the help for isSingular)

model_2.2_ii <- lmer(rt.obj ~ task*pas + (1 | subject) + (1 | cue) + (1 | trial) + (1 | target.type) + (1 | pas), data = df_e, REML = FALSE)
## boundary (singular) fit: see ?isSingular
summary(model_2.2_ii)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: rt.obj ~ task * pas + (1 | subject) + (1 | cue) + (1 | trial) +  
##     (1 | target.type) + (1 | pas)
##    Data: df_e
## 
##      AIC      BIC   logLik deviance df.resid 
##  61913.8  62047.6 -30938.9  61877.8    12510 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
##  -1.110  -0.151  -0.060   0.052 101.301 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  trial       (Intercept) 0.001579 0.03973 
##  cue         (Intercept) 0.090149 0.30025 
##  subject     (Intercept) 0.097953 0.31297 
##  pas         (Intercept) 0.000000 0.00000 
##  target.type (Intercept) 0.002199 0.04689 
##  Residual                8.111429 2.84806 
## Number of obs: 12528, groups:  
## trial, 432; cue, 36; subject, 29; pas, 4; target.type, 2
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          1.32179    0.11526  11.468
## taskquadruplet      -0.19687    0.16417  -1.199
## tasksingles         -0.16532    0.12280  -1.346
## pas2                -0.10552    0.11534  -0.915
## pas3                -0.30553    0.13193  -2.316
## pas4                -0.54952    0.13458  -4.083
## taskquadruplet:pas2  0.18823    0.15834   1.189
## tasksingles:pas2     0.08359    0.16647   0.502
## taskquadruplet:pas3  0.23070    0.18245   1.264
## tasksingles:pas3     0.06229    0.18568   0.335
## taskquadruplet:pas4  0.22297    0.17999   1.239
## tasksingles:pas4     0.04867    0.17571   0.277
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng pas2   pas3   pas4   tskq:2 tsks:2 tskq:3
## taskqudrplt -0.306                                                        
## tasksingles -0.426  0.358                                                 
## pas2        -0.472  0.304  0.415                                          
## pas3        -0.427  0.279  0.368  0.429                                   
## pas4        -0.428  0.286  0.358  0.415  0.397                            
## tskqdrplt:2  0.324 -0.433 -0.301 -0.699 -0.286 -0.277                     
## tsksngls:p2  0.314 -0.225 -0.656 -0.674 -0.282 -0.271  0.484              
## tskqdrplt:3  0.282 -0.383 -0.266 -0.284 -0.684 -0.240  0.400  0.197       
## tsksngls:p3  0.281 -0.208 -0.592 -0.280 -0.678 -0.243  0.203  0.440  0.487
## tskqdrplt:4  0.283 -0.399 -0.272 -0.285 -0.252 -0.660  0.403  0.197  0.350
## tsksngls:p4  0.291 -0.220 -0.628 -0.292 -0.257 -0.682  0.213  0.461  0.188
##             tsks:3 tskq:4
## taskqudrplt              
## tasksingles              
## pas2                     
## pas3                     
## pas4                     
## tskqdrplt:2              
## tsksngls:p2              
## tskqdrplt:3              
## tsksngls:p3              
## tskqdrplt:4  0.177       
## tsksngls:p4  0.414  0.510
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
print(VarCorr(model_2.2_ii), comp='Variance')
##  Groups      Name        Variance 
##  trial       (Intercept) 0.0015785
##  cue         (Intercept) 0.0901493
##  subject     (Intercept) 0.0979532
##  pas         (Intercept) 0.0000000
##  target.type (Intercept) 0.0021986
##  Residual                8.1114287

From ?isSingular: it “Evaluates whether a fitted mixed model is (almost / near) singular, i.e., the parameters are on the boundary of the feasible parameter space: variances of one or more linear combinations of effects are (close to) zero.” A singular fit is detected when the variance of some effect is (close to) zero, or when estimated correlations are (almost) exactly -1 or 1. With simple random intercepts, singularity is relatively easy to detect. In the output, we see that the variance of the pas intercepts is exactly zero, which is what makes the fit singular.


iii. in your own words - how could you explain why your model would result in a singular fit?

The model is rather complex with five random intercepts, which at first sight could seem to be the reason for the singularity warning. However, as the output shows, the singularity is due to pas: we get the same message even when the four other random intercepts are removed. pas does not explain any variance as a random effect (its variance is estimated at exactly zero, the boundary of the parameter space), presumably because pas is already included among the fixed effects, so including it as a random intercept as well creates a singular fit.
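The situation can be reproduced on toy data (a sketch assuming lme4 is installed; the data are simulated to mirror the case where a factor appears both as a fixed effect and as a random intercept):

```r
library(lme4)

set.seed(2)
# 20 groups g with real between-group variation, and a 4-level factor f
d <- data.frame(g = factor(rep(1:20, each = 10)),
                f = factor(rep(1:4, times = 50)))
u <- rnorm(20, sd = 1)                        # true group effects for g
d$y <- u[d$g] + as.numeric(d$f) + rnorm(200)  # f enters only via fixed means

# f is already in the fixed effects, so a random intercept for f has
# nothing left to explain: its variance is estimated at the boundary (zero)
m <- lmer(y ~ f + (1 | g) + (1 | f), data = d)
print(VarCorr(m), comp = "Variance")
isSingular(m)
```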


Exercise 3

1) Initialise a new data frame, data.count. count should indicate the number of times they categorized their experience as pas 1-4 for each task. I.e. the data frame would have for subject 1: for task:singles, pas1 was used # times, pas2 was used # times, pas3 was used # times and pas4 was used # times. You would then do the same for task:pairs and task:quadruplet

data.count <- df %>% 
  group_by(subject, task, pas) %>% 
  summarise("count" = n())
## `summarise()` regrouping output by 'subject', 'task' (override with `.groups` argument)
data.count$subject <- as.factor(data.count$subject)
data.count$task <- as.factor(data.count$task)
data.count$pas <- as.factor(data.count$pas)
data.count$count <- as.numeric(data.count$count)

data.count
## # A tibble: 340 x 4
## # Groups:   subject, task [87]
##    subject task       pas   count
##    <fct>   <fct>      <fct> <dbl>
##  1 1       pairs      1       109
##  2 1       pairs      2        45
##  3 1       pairs      3         4
##  4 1       pairs      4        12
##  5 1       quadruplet 1       141
##  6 1       quadruplet 2        23
##  7 1       quadruplet 3         3
##  8 1       quadruplet 4         1
##  9 1       singles    1        43
## 10 1       singles    2        87
## # … with 330 more rows

2) Now fit a multilevel model that models a unique “slope” for pas for each subject with the interaction between pas and task and their main effects being modelled

model_3.2.interaction <- lme4::glmer(count~pas*task + (1+pas|subject), 
                                     data = data.count, 
                                     family = poisson, 
                                     glmerControl(optimizer="bobyqa")) #Control added

summary(model_3.2.interaction)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: count ~ pas * task + (1 + pas | subject)
##    Data: data.count
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   3148.4   3232.7  -1552.2   3104.4      318 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3871 -0.7853 -0.0469  0.7550  6.5438 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr             
##  subject (Intercept) 0.3324   0.5765                    
##          pas2        0.3803   0.6167   -0.75            
##          pas3        1.1960   1.0936   -0.84  0.63      
##          pas4        2.3736   1.5407   -0.86  0.42  0.72
## Number of obs: 340, groups:  subject, 29
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          4.03570    0.10976  36.769  < 2e-16 ***
## pas2                -0.02378    0.11963  -0.199 0.842458    
## pas3                -0.51365    0.20718  -2.479 0.013166 *  
## pas4                -0.77292    0.29075  -2.658 0.007853 ** 
## taskquadruplet       0.11490    0.03127   3.674 0.000239 ***
## tasksingles         -0.23095    0.03418  -6.756 1.42e-11 ***
## pas2:taskquadruplet -0.11375    0.04605  -2.470 0.013508 *  
## pas3:taskquadruplet -0.20901    0.05287  -3.954 7.70e-05 ***
## pas4:taskquadruplet -0.21500    0.05230  -4.111 3.94e-05 ***
## pas2:tasksingles     0.19536    0.04830   4.045 5.23e-05 ***
## pas3:tasksingles     0.24299    0.05369   4.526 6.02e-06 ***
## pas4:tasksingles     0.56346    0.05101  11.045  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas2   pas3   pas4   tskqdr tsksng ps2:tskq ps3:tskq
## pas2        -0.742                                                     
## pas3        -0.829  0.613                                              
## pas4        -0.847  0.412  0.703                                       
## taskqudrplt -0.151  0.138  0.080  0.057                                
## tasksingles -0.138  0.126  0.073  0.052  0.484                         
## ps2:tskqdrp  0.102 -0.198 -0.054 -0.039 -0.679 -0.328                  
## ps3:tskqdrp  0.089 -0.082 -0.125 -0.034 -0.592 -0.286  0.402           
## ps4:tskqdrp  0.090 -0.083 -0.048 -0.093 -0.598 -0.289  0.406    0.354  
## ps2:tsksngl  0.098 -0.188 -0.052 -0.037 -0.342 -0.708  0.490    0.203  
## ps3:tsksngl  0.088 -0.080 -0.124 -0.033 -0.308 -0.637  0.209    0.486  
## ps4:tsksngl  0.092 -0.085 -0.049 -0.091 -0.324 -0.670  0.220    0.192  
##             ps4:tskq ps2:tsks ps3:tsks
## pas2                                  
## pas3                                  
## pas4                                  
## taskqudrplt                           
## tasksingles                           
## ps2:tskqdrp                           
## ps3:tskqdrp                           
## ps4:tskqdrp                           
## ps2:tsksngl  0.205                    
## ps3:tsksngl  0.184    0.451           
## ps4:tsksngl  0.507    0.474    0.427

i. which family should be used? The “poisson” family should be used, since the dependent variable consists of count data (non-negative integers), which is the case here.
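The choice can be illustrated on simulated count data (a base-R sketch; the rate and coefficients are made up for illustration):

```r
set.seed(1)
# counts whose log-mean depends linearly on a predictor
x <- runif(500)
y <- rpois(500, lambda = exp(1 + 0.5 * x))

m <- glm(y ~ x, family = poisson)  # log link is the default for poisson
coef(m)  # estimates should land near the true values (1, 0.5)
```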

ii. why is a slope for pas not really being modelled? pas is modelled as a factor, so its levels (1-4) are treated as categorical. A proper slope requires a continuous predictor on a continuum, which pas is not here; instead a separate offset is estimated for each pas level.

iii. if you get a convergence error, try another algorithm (the default is the Nelder_Mead) - try (bobyqa) for which the dfoptim package is needed. In glmer, you can add the following for the control argument: glmerControl(optimizer="bobyqa") (if you are interested, also have a look at the function allFit)

The bobyqa optimizer (specified in the glmerControl argument above) converges, so nothing further is changed here.

iv. when you have a converging fit - fit a model with only the main effects of pas and task. Compare this with the model that also includes the interaction.

model_3.2 <- lme4::glmer(count ~ pas + task + (1+pas|subject), data = data.count, family = poisson, glmerControl(optimizer="bobyqa"))

summary(model_3.2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: count ~ pas + task + (1 + pas | subject)
##    Data: data.count
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   3398.5   3459.8  -1683.3   3366.5      324 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.5885 -0.9001 -0.0477  0.8253  6.5100 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr             
##  subject (Intercept) 0.3325   0.5766                    
##          pas2        0.3805   0.6169   -0.75            
##          pas3        1.1895   1.0906   -0.84  0.63      
##          pas4        2.4222   1.5563   -0.86  0.42  0.73
## Number of obs: 340, groups:  subject, 29
## 
## Fixed effects:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     4.004589   0.108724  36.833   <2e-16 ***
## pas2           -0.006528   0.116616  -0.056   0.9554    
## pas3           -0.509918   0.204452  -2.494   0.0126 *  
## pas4           -0.663832   0.291962  -2.274   0.0230 *  
## taskquadruplet  0.003294   0.018188   0.181   0.8563    
## tasksingles     0.004307   0.018177   0.237   0.8127    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas2   pas3   pas4   tskqdr
## pas2        -0.742                            
## pas3        -0.832  0.623                     
## pas4        -0.850  0.412  0.723              
## taskqudrplt -0.084  0.000  0.001 -0.002       
## tasksingles -0.084  0.000  0.000  0.000  0.501

v. indicate which of the two models, you would choose and why

AIC(model_3.2, model_3.2.interaction)
##                       df      AIC
## model_3.2             16 3398.549
## model_3.2.interaction 22 3148.441
sqrt(sum(residuals(model_3.2)^2))
## [1] 31.02358
sqrt(sum(residuals(model_3.2.interaction)^2))
## [1] 26.44781

Several of the interaction terms have a significant effect on the dependent variable, so they should be kept. The AIC comparison also favours the interaction model (3148.4 vs 3398.5), as does its smaller residual sum of squares.
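The comparison could also be made with a formal likelihood-ratio test via anova(); a sketch on simulated counts, with ordinary glms standing in for the glmer models (the same anova() comparison also works on merMod objects):

```r
set.seed(3)
# balanced toy design with a genuine a:b interaction in the rates
d <- data.frame(a = factor(rep(1:3, each = 100)),
                b = factor(rep(1:2, times = 150)))
d$y <- rpois(300, lambda = exp(1 + 0.3 * (d$a == "2") +
                                 0.6 * (d$a == "3") * (d$b == "2")))

m_main <- glm(y ~ a + b, data = d, family = poisson)
m_int  <- glm(y ~ a * b, data = d, family = poisson)

anova(m_main, m_int, test = "Chisq")  # does the interaction improve the fit?
AIC(m_main, m_int)                    # lower AIC favours the interaction model
```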

vi. based on your chosen model - write a short report on what this says about the distribution of ratings as dependent on pas and task.

When pas = 1 and task = pairs, the expected count is exp(4.03570) ≈ 56.58. According to the model, going from pas 1 to pas 2, 3 or 4 decreases the expected count; these effects are only significant for pas 3 and pas 4, p < .05.

Going from the pairs to the quadruplet task, the model predicts an increase in the expected count of ≈ 12%, while going from pairs to singles predicts a decrease of ≈ 21%. Both of these estimates are significant, p < .05.

The model shows a decrease for all interactions between the pas levels and the quadruplet task, and an increase for all interactions between the pas levels and the singles task. All interactions are significant, p < .05.
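The back-transformed figures quoted above can be reproduced directly from the fixed-effect estimates (base R arithmetic on the coefficients printed in the summary):

```r
exp(4.03570)       # expected count for pas 1, pairs: about 56.6
exp(0.11490) - 1   # quadruplet vs pairs: about +12%
exp(-0.23095) - 1  # singles vs pairs: about -21%
```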

vii. include a plot that shows the estimated amount of ratings for four subjects of your choosing

For this question, we do not feel that we have the knowledge to approach it fully, but here is our best attempt at a plot that might be informative.

data.count2 <- data.count[data.count$subject %in% c(4, 12, 19, 25), ]

count_mmmodel_foursubj <- glmer(count~pas*task + (1+pas|subject), data=data.count2, family="poisson")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00961009 (tol = 0.002, component 1)
data.count2$fitted.values <- fitted(count_mmmodel_foursubj)

# extract each subject's fitted values (subject 12 contributes only 10 rows)
subject4 <- fitted(count_mmmodel_foursubj)[1:12]
subject12 <- fitted(count_mmmodel_foursubj)[13:22]
subject19 <- fitted(count_mmmodel_foursubj)[23:34]
subject25 <- fitted(count_mmmodel_foursubj)[35:46]

par(mfrow=c(2,2))
barplot(subject4, main="Subject 4 fitted values", xlab="pas x task combination", ylab="Fitted values")
barplot(subject12, main="Subject 12 fitted values", xlab="pas x task combination", ylab="Fitted values")
barplot(subject19, main="Subject 19 fitted values", xlab="pas x task combination", ylab="Fitted values")
barplot(subject25, main="Subject 25 fitted values", xlab="pas x task combination", ylab="Fitted values")

We reasoned that the family should be Poisson; however, we do not possess enough knowledge about Poisson models and their plotting to be confident in how to visualize the subjects. We have tried to plot the fitted values for each subject, but these come from a new model fitted only to these four subjects, not the model made earlier above. We will therefore revisit and improve these plots once we have found a better way of visualizing what the exercise asks for.


3) Finally, fit a multilevel model that models correct as dependent on task with a unique intercept for each subject.

Since the dataset is not specified for Exercise 3, we work with the whole dataset in this part of the exercise.

model_e3.1 <- lme4::glmer(correct ~ task + (1 | subject), data = df, family = "binomial")
summary(model_e3.1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ task + (1 | subject)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##  19927.2  19958.4  -9959.6  19919.2    18127 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7426 -1.0976  0.5098  0.6101  0.9111 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1775   0.4214  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.10071    0.08387  13.124  < 2e-16 ***
## taskquadruplet -0.09825    0.04190  -2.345    0.019 *  
## tasksingles     0.18542    0.04336   4.276  1.9e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr
## taskqudrplt -0.256       
## tasksingles -0.247  0.495

i. does task explain performance?

The p-values in the output indicate that task does explain performance significantly: both the quadruplet and singles contrasts differ from the pairs baseline. Note, however, that the estimates are on the log-odds scale, so their practical size is hard to judge from the p-values alone.
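To see how large these task effects actually are, the log-odds estimates from the summary can be converted to predicted probabilities. A quick check in plain Python rather than R (the `inv_logit` helper plays the role of R's `plogis`; the estimates are copied from the summary above):

```python
import math

def inv_logit(x):
    # maps log-odds to a probability, like R's plogis()
    return 1.0 / (1.0 + math.exp(-x))

intercept = 1.10071       # pairs task (baseline)
b_quadruplet = -0.09825
b_singles = 0.18542

p_pairs = inv_logit(intercept)                      # ≈ 0.750
p_quadruplet = inv_logit(intercept + b_quadruplet)  # ≈ 0.732
p_singles = inv_logit(intercept + b_singles)        # ≈ 0.783
```

So the predicted accuracies differ by only a few percentage points between tasks, even though the contrasts are statistically significant.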

ii. add pas as a main effect on top of task - what are the consequences of that?

model_e3.2 <- lme4::glmer(correct ~ task + pas + (1 | subject), data = df, family = "binomial")
summary(model_e3.2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ task + pas + (1 | subject)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##  17424.9  17479.5  -8705.5  17410.9    18124 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.4872 -0.6225  0.3240  0.5767  1.6144 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1979   0.4449  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     0.08530    0.09149   0.932    0.351    
## taskquadruplet -0.03055    0.04498  -0.679    0.497    
## tasksingles    -0.01059    0.04688  -0.226    0.821    
## pas2            0.95477    0.04420  21.599   <2e-16 ***
## pas3            1.97709    0.06221  31.780   <2e-16 ***
## pas4            3.12732    0.08626  36.254   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) tskqdr tsksng pas2   pas3  
## taskqudrplt -0.260                            
## tasksingles -0.226  0.489                     
## pas2        -0.213  0.021 -0.040              
## pas3        -0.166  0.030 -0.045  0.356       
## pas4        -0.123  0.016 -0.079  0.257  0.236

Here, pas has a significant effect on performance, while task no longer does. Thus, once pas is included in the model, the apparent effect of task on performance disappears.
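The pas effect is also far larger than any task contrast. Converting the model_e3.2 estimates to predicted probabilities makes this concrete (plain Python sketch rather than R; estimates copied from the summary above):

```python
import math

def inv_logit(x):
    # maps log-odds to a probability, like R's plogis()
    return 1.0 / (1.0 + math.exp(-x))

intercept = 0.08530   # pas = 1, task = pairs
b_pas4 = 3.12732

p_pas1 = inv_logit(intercept)            # ≈ 0.52, barely above chance
p_pas4 = inv_logit(intercept + b_pas4)   # ≈ 0.96, near ceiling
```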

iii. now fit a multilevel model that models correct as dependent on pas with a unique intercept for each subject

model_e3.3 <- lme4::glmer(correct ~ pas + (1 | subject), data = df, family = "binomial")
summary(model_e3.3)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ pas + (1 | subject)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##  17421.4  17460.4  -8705.7  17411.4    18126 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -8.5665 -0.6243  0.3244  0.5754  1.6017 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1981   0.4451  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.07044    0.08773   0.803    0.422    
## pas2         0.95575    0.04410  21.671   <2e-16 ***
## pas3         1.97892    0.06201  31.914   <2e-16 ***
## pas4         3.12940    0.08579  36.476   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr) pas2   pas3  
## pas2 -0.223              
## pas3 -0.173  0.352       
## pas4 -0.136  0.253  0.231

iv. finally, fit a model that models the interaction between task and pas and their main effects

model_e3.4 <- lme4::glmer(correct ~ pas*task + (1 | subject), data = df, family = "binomial")
summary(model_e3.4)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct ~ pas * task + (1 | subject)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##  17431.0  17532.4  -8702.5  17405.0    18118 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.9329 -0.6276  0.3186  0.5750  1.6411 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.1987   0.4458  
## Number of obs: 18131, groups:  subject, 29
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.083740   0.095748   0.875    0.382    
## pas2                 0.963210   0.073764  13.058   <2e-16 ***
## pas3                 1.999968   0.103115  19.395   <2e-16 ***
## pas4                 3.049586   0.146008  20.886   <2e-16 ***
## taskquadruplet       0.006492   0.064404   0.101    0.920    
## tasksingles         -0.058967   0.070531  -0.836    0.403    
## pas2:taskquadruplet -0.049301   0.100806  -0.489    0.625    
## pas3:taskquadruplet -0.134143   0.142505  -0.941    0.347    
## pas4:taskquadruplet -0.095786   0.200379  -0.478    0.633    
## pas2:tasksingles     0.040476   0.106074   0.382    0.703    
## pas3:tasksingles     0.079941   0.145506   0.549    0.583    
## pas4:tasksingles     0.296579   0.197154   1.504    0.133    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) pas2   pas3   pas4   tskqdr tsksng ps2:tskq ps3:tskq
## pas2        -0.329                                                     
## pas3        -0.243  0.323                                              
## pas4        -0.176  0.224  0.178                                       
## taskqudrplt -0.357  0.464  0.333  0.236                                
## tasksingles -0.323  0.420  0.300  0.207  0.482                         
## ps2:tskqdrp  0.227 -0.704 -0.212 -0.149 -0.639 -0.307                  
## ps3:tskqdrp  0.162 -0.212 -0.696 -0.107 -0.452 -0.218  0.289           
## ps4:tskqdrp  0.114 -0.147 -0.107 -0.700 -0.322 -0.155  0.205    0.145  
## ps2:tsksngl  0.221 -0.676 -0.210 -0.146 -0.321 -0.665  0.488    0.146  
## ps3:tsksngl  0.158 -0.205 -0.681 -0.106 -0.234 -0.485  0.149    0.491  
## ps4:tsksngl  0.115 -0.148 -0.105 -0.707 -0.172 -0.358  0.109    0.078  
##             ps4:tskq ps2:tsks ps3:tsks
## pas2                                  
## pas3                                  
## pas4                                  
## taskqudrplt                           
## tasksingles                           
## ps2:tskqdrp                           
## ps3:tskqdrp                           
## ps4:tskqdrp                           
## ps2:tsksngl  0.102                    
## ps3:tsksngl  0.076    0.324           
## ps4:tsksngl  0.517    0.237    0.174

v. describe in your words which model is the best in explaining the variance in accuracy

sqrt(sum(residuals(model_e3.1)^2))
## [1] 140.726
sqrt(sum(residuals(model_e3.2)^2))
## [1] 131.5165
sqrt(sum(residuals(model_e3.3)^2))
## [1] 131.5183
sqrt(sum(residuals(model_e3.4)^2))
## [1] 131.4935
anova(model_e3.1, model_e3.2, model_e3.3, model_e3.4, test = 'LR')
## Data: df
## Models:
## model_e3.1: correct ~ task + (1 | subject)
## model_e3.3: correct ~ pas + (1 | subject)
## model_e3.2: correct ~ task + pas + (1 | subject)
## model_e3.4: correct ~ pas * task + (1 | subject)
##            npar   AIC   BIC  logLik deviance     Chisq Df Pr(>Chisq)    
## model_e3.1    4 19927 19958 -9959.6    19919                            
## model_e3.3    5 17421 17460 -8705.7    17411 2507.8224  1     <2e-16 ***
## model_e3.2    7 17425 17480 -8705.5    17411    0.4759  2     0.7883    
## model_e3.4   13 17431 17532 -8702.5    17405    5.9376  6     0.4302    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The analysis of variance shows that model_e3.3 (correct ~ pas + (1 | subject)) has the lowest AIC, which indicates that it is the best-fitting model. Its summary also shows that all three pas estimates are highly significant. Adding task on top of pas (model_e3.2) does not significantly improve the deviance (p = .79), and neither does the pas by task interaction (model_e3.4, p = .43), while the likelihood-ratio test shows that model_e3.3 is significantly better than model_e3.1. Based on these observations, model_e3.3 is chosen as the model that best explains the variance in accuracy.
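The likelihood-ratio statistic reported for model_e3.3 against model_e3.1 can be reproduced by hand from the log-likelihoods in the table (plain Python rather than R, values copied from the anova() output above):

```python
# log-likelihoods copied from the anova() table above
loglik_task = -9959.6   # model_e3.1: correct ~ task + (1 | subject)
loglik_pas = -8705.7    # model_e3.3: correct ~ pas + (1 | subject)

# the LR chi-square is twice the difference in log-likelihoods
chisq = 2.0 * (loglik_pas - loglik_task)
print(round(chisq, 1))  # → 2507.8, matching the Chisq column
```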